Abstract


We introduce a novel three-step model of adaptive e-learning that uses multiple learner
characteristics. We design a model of learner attributes covering the study domain, summary
details of the student and the requirements of the student. We apply theories of learning
style in the model to categorize and identify individual learners so as to improve their
experience on the online learning platform. An affective state extraction model extracts
learner emotions from text inputs during platform interactions. Finally, we pass the extracted
information to the adaptivity domain, which uses the off-policy, model-free Q-learning
algorithm (Jang et al., 2019) to structure the learning path into tutorials, lectures and workshops
depending on predefined learning constraints. Simulated results show better adaptivity
with multiple learner characteristics than with a single learner characteristic. Further research
should include more than the three characteristics used in this research.


Keywords: reinforcement learning, adaptive learning, learner characteristics. 


1. Introduction 


The increase in learner enrolment has forced higher education institutions to look for
effective ways of reaching many learners. One of the strategies they have deployed is
e-learning. According to Steinbacher and Hoffmann (2015), this involves including digital tools
in learning delivery and employing technology so that learning can take place without
limitations of time and location. Hadullo et al. (2018) identified inadequate academic staff to
facilitate online learning and poorly designed, non-interactive course materials as challenges
faced by learners in online learning. There is therefore a need to improve the quality of
e-learning. One advantage that face-to-face learning has over online learning is that learners
get immediate feedback and clarification on areas where they face difficulties (Linecar &
Marchbank, 2020). Research effort has gone into personalizing learning by making learning
management systems adaptive (Sethi & S Lomte, 2017). To improve the quality of e-learning and
cater for learners’ needs, researchers have developed adaptive learning systems.


Research in adaptive learning goes back to the 1990s. At that time researchers
were looking at two major areas: hypertext and user modelling (Ennouamani & Mahani, 2017).
Research in adaptive learning has grown ever since. One of the major current research areas in
e-learning adaptivity is learner modelling (Premlatha & Geetha, 2015). Chrysafiadi
and Virvou (2013) and Raj and Renumol (2021) listed the approaches for modelling
learner characteristics as follows: overlay; stereotyping; perturbation; machine learning
techniques; cognitive theories; constraint-based models; fuzzy learner modelling; Bayesian
networks; and ontology-based modelling. According to Chrysafiadi and Virvou (2013), learner
modelling is the foundation for adaptive learning.


Several researchers have used different techniques to develop learner models.
These techniques fall into two classes: static and dynamic methods. Static methods
involve collecting information about learners by having them fill out a questionnaire. According
to El Aissaoui et al. (2018), this method does not lead to accurate detection of learners’ learning
styles, as learners are usually unaware of their learning styles. With dynamic methods, learner
characteristics are collected while learners interact with the system (Ennouamani & Mahani,
2017). To create learner models, many researchers are now focusing on dynamic
techniques.


Sethi et al. (2017) identified the following needs to be filled in adaptive learning systems:
ways of identifying and confirming learning styles; improvement of the automatic learning style
identification process; improvement of the agents guiding the learner during learning; the
ability to track learning behavior; and basing adaptive LMS on learner assessment.


2. Literature review 


This section introduces adaptive learning and e-learning and reviews learning theories
and their relation to adaptive learning. It then reviews research on learner characteristics in an
e-learning environment and e-learning models based on various learner characteristics. The
section also gives an overview of the Artificial Intelligence (AI) techniques used in adaptive
e-learning systems.


2.1 Theories of learning in adaptive e-learning 


Significant advancements in technology have brought tools, environments and
procedures for aiding learning, along with a number of changes in learning environments. The
way people learn keeps changing, or will be made to change, in conformity with emerging
trends and technological issues. However, Havard et al. (2016) advocate that the implementation
of technology in learning should not happen in isolation but be driven by the way people learn. In
this section, we review some of the learning theories that various researchers have put forward
to enhance adaptive learning.


Quite a number of learning theories have been put forward by various researchers to
address online learning. According to Hadullo et al. (2017), the following learning theories have
been examined and proposed as theories that can make e-learning effective: social
constructivism, the theory of network, cognitive load theory and connectivism.


Constructivism, behaviorism and cognitivism are the main learning theories that have
been the building blocks of the learning and instruction process. Some researchers go deeper
and suggest specifics; Hammad et al. (2018) advocate that adaptive systems be based on
constructivist, behaviorist and cognitivist principles at a high level.


According to Dalgarno (2001), a constructivist envisions learning as a
knowledge construction process in which understanding is built from past experiences and inputs,
shifting the focus from teaching to guiding learners so that the learners themselves construct
knowledge. In behaviorism, learning is viewed as a response to external stimuli, reinforced
through state-action activities in the environment, so as to achieve a specific set objective.
Cognitivism relates learning to a computer process, as it defines learning as a process of
acquiring, storing and retrieving information.


Table 1. Learner characteristics and theories of learning in adaptive e-learning systems 















































Authors | Studied learner characteristics | Theory
(Almohammadi & Hagras, 2013b) | Learner knowledge | Cognitivism
(Deeb et al., 2014) | Learning style | Cognitivism
(Fenza et al., 2017) | Learner knowledge | Cognitivism
(Kolekar et al., 2010) | Learning style | Cognitivism
(Rajendran et al., 2018) | The learner’s affective states | Cognitivism
(Malpani, 2011) | The learner’s prior knowledge and current knowledge level, measured by the learner’s ability to answer quizzes correctly | Cognitivism
(Sabourin et al., 2011) | Learner affect | Behaviorism
(C. H. Wu et al., 2017) | Learner knowledge | Cognitivism
(Alshammari et al., 2015) | Learning style | Cognitivism
(Whitehill & Movellan, 2018) | Learner knowledge | Cognitivism
(Hwang et al., 2013) | Learning style | Cognitivism
(Yang et al., 2016) | Both the learner’s learning style and cognitive styles | Cognitivism
(S. Y. Chen et al., 2016) | Cognitive style, gender differences | Cognitivism
(C. M. Chen & Li, 2010) | Learner context | Constructivism

















As the studies listed in Table 1 show, most, if not all, of the adaptive e-learning
systems failed to incorporate all three learning theories. The most common theory among the
studies is cognitivism, and all the studies are mono-theoretical as far as learning theories
are concerned.


The studies that based adaptivity in e-learning systems on theories of learning
utilized just one aspect of those theories. The most utilized aspect of cognitivism was how
learners process information. There is therefore a need for adaptive e-learning systems to be
based on the complete principles of a learning theory, so that we can tell whether an outcome
resulted from basing the system on that particular learning theory.


Since most of these concepts of how learning occurs are built on the weaknesses
of the preceding concepts, there is a need to combine the principles of the learning theories
when building adaptive e-learning systems in order to explain learning properly.


2.2 Learner characteristics and adaptive learning systems 


Most adaptive systems have succeeded where their abilities were based on accurate
assessment of general and specific learner characteristics (Colchester et al., 2017). This is
what informs learner modelling, which drives adaptation based on learner characteristics.
Deciding which learner characteristics should be part of the learner model is usually a
challenge (Nurjanah, 2008). Learner characteristics can be static or dynamic, a classification
that applies during modelling. Static learner characteristics include attributes such as name,
age and email, which do not change during actual or simulated learning. They are collected
through customized questionnaires, backed by both a front end and a back end for such data.
Adaptivity in e-learning systems may likewise be classified dichotomously as static or dynamic.
Dynamic learner characteristics are features acquired as a result of interactions with the
environment, which complicates modelling their specificity rather than modelling their
applicability (Chrysafiadi & Virvou, 2013).


Table 2. Summary of learner characteristics used in different adaptive systems studies 





Authors and title | Learner characteristics
(D. Wu et al., 2015) A fuzzy tree matching-based personalized E-learning recommender system | Preferences
(Tadlaoui et al., 2018) A learner model based on multi-entity Bayesian networks and artificial intelligence in adaptive hypermedia educational systems | Learner knowledge
(Alshammari et al., 2014) Adaptivity in E-Learning Systems | Learning style, learner knowledge and learner preferences were found to be the most used learner characteristics in the learner model
(Kanimozhi, n.d.) An Adaptive E-Learning Environment Centered On Learner’s Emotional Behavior | Emotional behavior
(Almohammadi & Hagras, 2013b) An Adaptive Fuzzy Logic Based System for Improved Knowledge Delivery within Intelligent eLearning Platforms | Learner knowledge
(Deeb et al., 2014) An Adaptive HMM Based Approach for Improving E-Learning Methods | Learning style
(Ennouamani & Mahani, 2017) An overview of adaptive e-learning systems | Source of adaptation (learner, device, environment)














2.3 E-Learning models 


Learner models form the foundation of adaptation in e-learning (Ding et al., 2018).
Various models have been explored, developed and aligned to model varied learner
characteristics. Rabat (2016) drew on two adult learning theories, andragogy and self-directed
learning, to come up with an adaptive e-learning model. The model encompassed prior
knowledge, affective states, personality traits, cognitive characteristics, personal
characteristics and knowledge. Mejia et al. (2017) considered people with disabilities, so their
model consisted of demographic data, competencies, reading difficulties and cognitive
traits; they did not consider any learning theory. Huang et al. (2017) placed the
learner in a contextual environment and modelled the learner’s context with regard to social
context, cognitive levels, basic information, learning style, learner preferences and related
interests. Ding et al. (2018) considered basic initial information, the learner’s learning
style, the learner’s cognitive abilities and the learner’s prior knowledge state in their model.
Neither Huang et al. (2017) nor Ding et al. (2018) took any learning theory into account in
their models.
Table 3. Summary of existing learner models














No. | Authors and title | Characteristics in the model | Theoretical underpinning
1 | (Rabat, 2016) Towards an Adult Learner Model in an Online Learning Environment | Personal, cognitive, social, personality traits, emotional traits, cognitive and knowledge | Andragogy and self-directed learning
2 | (Mejia et al., 2017) Inclusive Learner Model for Adaptive Recommendations in Virtual Education | Demographics, competences, reading difficulties, learning style, cognitive traits | No learning theory
3 | (Huang et al., 2017) Research on Individualized Learner Model Based on Context-awareness | Basic learner information, learner cognitive levels, learner learning style, and learner interest preference | No learning theory
4 | (Ding et al., 2018) A New Learner Model in Adaptive Learning System | Basic learner information, the learner’s learning style, the cognitive abilities of the learner and the prior knowledge state of the learner | No learning theory indicated




















2.5 AI techniques applied in adaptivity in e-learning systems 


AI tools are seen as appropriate for modelling learners because they can replicate
the human decision-making process. AI techniques that have been used for constructing learner
models include fuzzy logic, neural networks, Bayesian networks and hidden
Markov models (Almohammadi & Hagras, 2013a). AI techniques have been used in two ways: first,
to classify learners into groups so that adaptation can be provided to those groups; second, to
diagnose learner characteristics as learners learn so that the instruction method can be adjusted.


Fuzzy logic is seen as an extension of set theory and is usually used to assess
the learning and knowledge of the learner. It has been used in several studies to provide
adaptation based on the learner’s knowledge. Almohammadi and Hagras (2013a) used fuzzy logic to
extract rules from learner data so that they could tell the knowledge needs of learners. Aajli and
Afdel (2017) used fuzzy logic to automatically generate the domain model of adaptive e-learning
systems.


Bayesian networks are directed acyclic graphs that are usually used for modelling
probabilistic dependencies among variables (Liu et al., 2006). Bayesian networks have been used
in adaptive systems to provide adaptive instruction. For instance, Liu et al. (2006) used
Bayesian networks to assess learner knowledge and provide instruction according to that
knowledge; Firte et al. (2009) used a Bayesian network to classify users based on their navigation
habits and then suggest content based on the classification; Guan et al. (2013) used a Bayesian
network to provide learning path adaptability by first constructing the domain module using a
Bayesian network; Ueno and Okamoto (2007) used a Bayesian network to provide motivational
messages based on the learner logs.


Hidden Markov models have also been used in adaptive e-learning systems. For instance,
Deeb et al. (2014) used the K-means algorithm together with hidden Markov models to cluster
learners into different learning styles and adapt content to suit each learner’s learning style;
Rani et al. (2017) used fuzzy Petri nets and a hidden Markov model to adapt learning content to
each learner in accordance with the learner’s learning path.


2.6 Conceptual framework 
Figure 1. Conceptual framework

















3. Methodology 


This research adopted an iterative incremental methodology. This is a time-based,
stepwise software development process in which each step defines a definitive block that keeps
expanding the model. It begins with initializing the specification to create a basic model. Once
an initial complete model exists, user testing is carried out; the resulting user feedback
informs the need for specification adjustments and incremental expansion of the model. The
process is repeated until the model becomes a functionally complete and acceptable application
that meets all the requirements put forth by the project (see Figure 2).
Figure 2. Iterative and incremental methodology


Project feedback is received after each iteration is completed.


3.1 Model architecture 


The Reinforcement Learning Approach for Adaptive E-learning Using Multiple
Learner Characteristics (RELUMECEL) model framework gathers three learner characteristics
(learning style, affective state and prior knowledge) and then gives recommendations on the
instructional design of the content best suited to each individual learner based on these three
characteristics. It also gives the best learning path for a learner revisiting the e-learning
environment, and gives content developers the updates needed to achieve adaptability for the
various learners represented in the given learning environment.
Figure 3. RELUMECEL Model architecture


3.1.2 The learner domain 


RELUMECEL has a module for collecting the learner’s profile information. As the
learner interacts with the e-learning environment, the following information is collected: user ID,
username, full names, email address, date of birth and the course being taken. The learner profile
information is used for tracking the learner, delivering the required information and modelling
adaptivity based on the learner profile. This domain is updated further with information from
the feature extraction domain.


The learner domain gathers basic information about the learner, is updated by the feature
extraction domain, and then passes its information on for adaptivity in RELUMECEL.
Figure 4. The learner domain
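
The learner domain described above can be sketched as a simple record. This is a minimal illustration only: the class name `LearnerProfile`, its field names and the two helper methods are assumptions made for this sketch and are not taken from the RELUMECEL implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LearnerProfile:
    """Hypothetical learner-domain record (names are illustrative assumptions)."""

    # Static profile information collected by the learner domain.
    user_id: int
    username: str
    full_names: str
    email: str
    date_of_birth: date
    course: str
    # Filled in later by the feature extraction domain.
    learning_style: Optional[str] = None
    affective_state: Optional[str] = None
    prior_knowledge_score: Optional[int] = None

    def update_features(self, learning_style: str, affective_state: str,
                        prior_knowledge_score: int) -> None:
        """Store the three extracted learner characteristics."""
        self.learning_style = learning_style
        self.affective_state = affective_state
        self.prior_knowledge_score = prior_knowledge_score

    def is_ready_for_adaptation(self) -> bool:
        """True once all three characteristics have been extracted."""
        return None not in (self.learning_style, self.affective_state,
                            self.prior_knowledge_score)
```

Keeping the extracted characteristics optional reflects the order described in the text: the profile exists first and is completed later by the feature extraction domain.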


3.1.3 The feature extraction domain 
Extraction of the learner’s learning style


The application of learning styles in e-learning environments reinforces and
enhances the learner’s experience by making the content retainable in the most effective and
realistic manner and form. Implementing learning style as a learner characteristic in an adaptive
e-learning environment allows the learner to acquire skills, knowledge and attitudes
through study or experience in line with their learning style preference.


We use the latest version of the VARK questionnaire, which was developed by
Fleming, to determine the learning style of the learner. We create a module that incorporates
the VARK questions and the analysis of responses as developed by VARK. The VARK
questionnaire is incorporated in RELUMECEL as an application module. The module analyzes the
learner’s responses and determines the learner’s learning style.


Figure 5 shows the application module with the VARK questionnaire.
Figure 5. VARK questionnaire module
At the initial interaction with the e-learning platform, the learner is taken through the
learning style module and answers the 16 questions of the questionnaire as given by
Fleming, which ask the learner to reveal the way he/she likes to learn. From the responses, the
model analyzes the learner and determines his/her learning style using the VARK database
developed by Fleming. The scores are used in the RELUMECEL engine for further analysis
together with the other learner characteristics.
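
As a rough illustration of how questionnaire responses could be turned into scores, the sketch below tallies the modality letters chosen for each question. It is a simplification under stated assumptions: the function names are hypothetical, the mapping from answer options to V/A/R/K is assumed to be given, and the dominant style is taken as the highest tally rather than through VARK's own analysis rules.

```python
MODALITIES = ("V", "A", "R", "K")  # Visual, Aural, Read/write, Kinesthetic


def score_vark(responses):
    """Tally modality letters; `responses` holds the letters chosen per question."""
    scores = {m: 0 for m in MODALITIES}
    for chosen in responses:
        for modality in chosen:
            if modality not in scores:
                raise ValueError(f"unknown modality: {modality!r}")
            scores[modality] += 1
    return scores


def dominant_styles(scores):
    """Return every modality sharing the top score (more than one means multimodal)."""
    top = max(scores.values())
    return [m for m in MODALITIES if scores[m] == top]


# Hypothetical answers to four questions (a learner may pick several options).
example = [["V"], ["V", "K"], ["R"], ["V"]]
example_scores = score_vark(example)
```

The per-modality scores, not just the dominant label, are what would be passed on for further analysis with the other learner characteristics.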


Affective states extraction 


RELUMECEL focuses on extracting the affective state and on “modelling the affective
state.” Modelling of the affective state is contextualized to the e-learning environment, and it
is measured in relation to the various learning styles modelled. The extraction process consists
of developing a model from existing natural language processing libraries, identifying the
dataset to be used, preparing the dataset, dividing it into training and test sets, identifying
the best classification algorithms and, finally, experimenting with the best training algorithms
for the best possible results. Once the model is built and tested, it is incorporated in the
RELUMECEL environment so that it can extract the learner’s affective state during his/her
interaction with the e-learning environment.


We used the ISEAR dataset, an authentic dataset labelled with seven emotional attributes,
including fear, anger, disgust and joy.
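
To make the classification step concrete, the sketch below trains a small multinomial naive Bayes text classifier. It is illustrative only: the source does not state which classifier was finally selected, the class name `NaiveBayesEmotion` is hypothetical, and the six toy sentences stand in for the ISEAR data, which is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


class NaiveBayesEmotion:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training sentences (assumptions for illustration, not ISEAR records).
TRAIN_TEXTS = [
    "i am scared of the exam", "the test makes me afraid",
    "this bug makes me furious", "i am angry at the grader",
    "i enjoyed the lesson", "i am happy with my result",
]
TRAIN_LABELS = ["fear", "fear", "anger", "anger", "joy", "joy"]

clf = NaiveBayesEmotion().fit(TRAIN_TEXTS, TRAIN_LABELS)
```

In a real run the labelled data would first be divided into training and test sets, as the text describes, and the held-out portion used to compare candidate classifiers.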


The prior knowledge extraction 


Measuring the learner’s level of knowledge in a particular field of study is crucial
in assisted adaptability for the learning path to be taken (van Riesen et al., 2018). Once the
learner logs into the system and selects the subject and topic he/she wants to take, he/she is
taken through the test questions of the subject and then guided to the next course of
action; the outcome is measured and a reward is given based on the nature of the outcome.


The information resulting from the prior knowledge extraction process is kept as a log
and fed into the adaptation module to be used later for adaptability processing. The extraction
of prior knowledge is extended further in the adaptation module. It forms the basis for
determining the state a learner is in and the type of action he/she should take to gain
maximum reward. It also determines whether to explore more of the environment or simply to
exploit what has been learned by greedily picking the next action that maximizes the reward.
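
The text does not say how the choice between exploring and exploiting is made. One common rule, assumed here purely for illustration, is epsilon-greedy selection: with probability epsilon a random action is tried, otherwise the action with the highest current Q-value is taken. The function name and the example action labels are hypothetical.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action from `q_row`, a dict mapping action -> Q-value.

    With probability `epsilon` explore (uniform random action);
    otherwise exploit (action with the highest Q-value).
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_row))
    return max(q_row, key=q_row.get)


# Hypothetical Q-values for one state of the learning environment.
q_values = {"study": 1.0, "solve exercises": 4.5, "read more": 2.0}
greedy_choice = epsilon_greedy(q_values, epsilon=0.0)
```

A learner with low measured prior knowledge could, under this assumption, be given a larger epsilon so that more of the environment is explored before the path settles.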


The questions for prior knowledge extraction are aligned with the course
being taken, the learning objectives and other instructional design requirements that are in
tandem with both the learning theories and the learner characteristics being studied for adaptation.


3.1.4 The adaptivity domain 


Once the RELUMECEL model has extracted the learner’s affective state,
learning style and prior knowledge, this information is used as input to the reinforcement
learning model, which is the core of the adaptivity domain.


In reinforcement learning, learning is a natural phenomenon that results from the
interaction of an agent with its environment (Sutton, 2018). The environment domain consists of
states and actions. The agent interacts with the environment in a specific and strategic way so
as to maximize the rewards apportioned during the learning process. As in other forms of
learning, situations are mapped to actions. In reinforcement learning, the agent/learner discovers
the best action to take in any given situation within the parameters of the environment. The agent
must proactively sense the environment and, among the actions available in a given state,
choose the one that maximizes the reward function. Once the best action is
taken, the agent’s state is updated and it acquires a new state.


Figure 6 visualizes a general reinforcement learning architecture. A given
reinforcement learning environment has features that define it: the state $S$, the time $t$ and
the state at a given time, $S_t$. A given state has a value that depends on the immediate reward
$R$ at time $t$, giving $R_t$.


State: $S_t \rightarrow S_{t+1}$


Figure 6. A reinforcement learning architecture
To implement our reinforcement learning, we will explore the Q-learning algorithm. 
Q-learning algorithm 


According to Balasubramanian Velusamy (2013), the Q-learning algorithm (Watkins,
1992) is a model-free reinforcement learning algorithm that focuses on finding the optimal policy
of a given Markov Decision Process (MDP). A Markov decision process is a 5-tuple
$(S, A, P_a, R_a, \gamma)$ where


$S$ is a finite set of states,
$A$ is a finite set of actions (or $A_s$ is the set of actions available from state $s$),
$p(s', r \mid s, a) = \Pr\{S_{t+1} = s', R_{t+1} = r \mid S_t = s, A_t = a\}$


is the probability that action $a$ in state $s$ at time $t$ will lead to state $s'$ with reward
$r$ at time $t + 1$ (in the deterministic case we have $\delta(s_t, a_t) = s_{t+1}$), $R_a(s, s')$
is the expected immediate reward received after transitioning from state $s$ to state $s'$ due to
action $a$, and $\gamma \in [0, 1)$ is the discount factor,
which represents the difference in importance between future rewards and present rewards.
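
To make the role of the discount factor concrete, the standard discounted return can be written out. This is the usual textbook definition, added here as an illustration; the symbol $G_t$ for the return is not used elsewhere in this paper.

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
    = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots
```

A value of $\gamma$ close to 0 makes the agent weigh immediate rewards almost exclusively, while a value close to 1 gives future rewards nearly the same weight as present ones.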
In a given problem domain, the agent strives to maximize the total reward as it
transitions from one state to another. The Q-function, on which Q-learning is based,
evaluates every combination of state and action to find the one that maximizes the reward.


The Q-function returns a fixed value at the start of processing. As the agent goes
through transitions and is rewarded, new values are computed and the Q-table is updated with
these new values.


The Q-function update is denoted by
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$
where
$s_t$ - the present or current state
$s_{t+1}$ - the next state
$Q(s_t, a_t)$ - the Q-value for the current state and action
$r_{t+1} = R(s_t, a_t)$ - the reward after performing action $a_t$ in $s_t$
$\alpha$ - the learning rate ($0 < \alpha < 1$)
$\gamma$ - the discount factor deciding the significance of future and upcoming possible rewards
($0 < \gamma < 1$)
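
The update rule above can be sketched as a single function over a dictionary-based Q-table. The function name, the table layout and the example state and action labels are assumptions for illustration, not the paper's implementation.

```python
def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-learning update in place and return the new Q-value.

    `q` maps state -> {action: Q-value}. A next state with no recorded
    actions contributes 0 to the target.
    """
    next_values = q.get(next_state, {})
    best_next = max(next_values.values()) if next_values else 0.0
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical two-state table for one lesson.
q_table = {
    "lesson": {"study": 0.0, "solve exercises": 0.0},
    "exercise": {"submit exercise": 10.0},
}
new_value = q_update(q_table, "lesson", "solve exercises",
                     reward=5.0, next_state="exercise", alpha=0.5, gamma=0.75)
```

With these numbers the target is 5 + 0.75 × 10 = 12.5, and moving halfway from 0 towards it gives 6.25.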


3.2 Implementation and discussion 


The reinforcement learning architecture begins with the learner’s characteristics
having been extracted by the various modules of the model. These are stored as part of the learner
logs, which are used for the computations that inform reinforcement and hence the learning
path.


The lessons are designed and generated using a specific instruction model that aligns
each lesson with the learning theories addressing the specific characteristics, thereby bringing
out the adaptability. Learners who go through the guided learning process attend the
online lessons, do the assignments and tests, and submit them where necessary. The model detects
the learner’s interactions and chooses the best path for the learner through actions taken at
given times within specific states. This is done so that the learner can get the maximum possible
reward based on the state-action space. The process is repeated continually until a best path is
determined based on the learner’s combined characteristic measurements.


A learner visiting the system for a second or subsequent time will have their information
retrieved and adaptability provided. This also applies to new learners with similar
learner characteristics.


Table 4 below shows a lesson plan indicating the module for the topic “Object oriented
programming using C++.”
Table 4. The learning modules for a given lesson




















Unit | One: Overview of C++
Description | Object Oriented paradigms, Data abstraction/control abstraction, OOPS principles, Origin of C++, Sample C++ program, dynamic initialization of variables, new and delete operators, C++ keywords, General form of C++ program, Type casting, Introducing C++ classes, Difference between class and structure.
Assessment | T1 - Exercise 1; T2 - Exercise 2; T3 - Exercise 3; Assignment 1
Expected from the student |








In this reinforcement learning model for adaptive e-learning, following the prescribed
instruction model, the states $s \in S$ to be considered include taking lessons, reading extra
material, solving exercises, going through questions and answers, waiting for an answer, waiting
for results, assignments, assessments, discussion, understanding and explanation. The actions
$a \in A$ to be considered include read/study, read more, study extra material, solve exercises,
submit an exercise, ask where in doubt, perform tests, discuss, give up, ask questions and receive
answers for more understanding, submit an assignment and, finally, complete the learning by
exiting the system or logging out.


We assign reward values between 0 and 10 and apportion them as in the algorithm
below.
Begin
    Do test
    If performance >= 3 and performance <= 5
        Provide with low level exercise
    Else if performance >= 6 and performance <= 8
        Provide with medium level exercise
    Else if performance >= 9 and performance <= 10
        Provide with high level exercise
    Else if performance >= 0 and performance <= 2
        Avail extra learning materials to study
    End if
End
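
The thresholds in the algorithm above can be sketched as a small function. The function name and the returned labels are illustrative assumptions; only the score bands come from the algorithm.

```python
def next_activity(performance: int) -> str:
    """Map a test score in the range 0..10 to the next learning activity."""
    if not 0 <= performance <= 10:
        raise ValueError("performance must be between 0 and 10")
    if performance <= 2:
        return "extra learning materials"
    if performance <= 5:
        return "low level exercise"
    if performance <= 8:
        return "medium level exercise"
    return "high level exercise"
```

Checking the bands in ascending order means each branch needs only an upper bound, and the range check up front rejects scores that none of the four bands cover.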


Figure 7 below shows the representation.
Figure 7. The diagram for state-action-reward of a single lesson


The e-learning content is presented according to a given designer, whose variables
are the title, subtitles, activities at each stage, timeline of each stage, intended outcomes and
program structure. The planner assessment is based on the ACM/IEEE curriculum
recommendations. We built an instructional environment and resources that are dynamic and
encompass adaptability, otherwise known as an adaptable instruction design based on
environment and resources.


As put forward in Schott (2015), instructional design follows theoretical and
practical research in the fields of cognition, educational psychology and problem-solving
techniques. The strategies used in instructional design support the creation of guidelines for
best practices in all aspects of the instruction process, which include planning and management
of the e-learning instruction method, delivery techniques, learner assessment and evaluation, and
feedback methods. The fundamental aim of the theory is to produce measurable changes in learners’
cognitive skills and attitudes. This calls for constructing lessons that achieve the intended
objectives, which in turn inform the creation of course plans.


There are a number of instruction models, including the analysis, design, development,
implementation and evaluation (ADDIE) model, the ASSURE model, the Dick and Carey model and the
4C/ID model (Khalil & Elkhider, 2020). The models are formed to implement at least one learning
theory, if not all of them. Learning theories (Cognitive & Processing, 2018) are defined as
organized sets of principles explaining how individuals acquire, retain and recall knowledge, and
they include behaviorism, cognitive information processing (cognitivism) and constructivism.


In this research the designer section uses the ASSURE instruction model to design the
dynamic courses and is keen on bringing out all the elements of the behaviorism learning theory.
The ASSURE model can be seen below.


• ANALYZE LEARNER CHARACTERISTICS
• STATE OBJECTIVES
• SELECT, MODIFY, OR DESIGN MATERIALS
• UTILIZE MATERIALS
• REQUIRE LEARNER RESPONSE
• EVALUATION





Figure 8. The ASSURE instructional model 


As shown in Figure 8, ASSURE is an acronym for the steps followed in the
model: Analyze learner characteristics, State objectives, Select/modify/design materials, Utilize
materials, Require learner response and Evaluation. In this research we extract multiple
learner characteristics to model the e-learning environment and provide adaptation to learners.
The ASSURE model is therefore ideal for our design purposes and, as indicated by Sundayana et al.
(2017), is well suited to problem-based and discovery learning.


Our environment states consist of lessons, exercises, assignments, assessments and exams, 
and the actions consist of study, study extra materials, do assignments, perform test, submit 
assignments and others, depending on the course composition. The state-action space to be 
considered is therefore wide. We assume that the agent in this environment is also influenced by 
the different learning characteristics of the learner/agent. 
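

The state and action lists above can be written down as a small sketch. The enumeration names, the 
integer codes and the helper empty_q_table are hypothetical and are introduced only for 
illustration; a real course would have more states and actions, depending on its composition. 

```python
from enum import Enum

import numpy as np


class State(Enum):
    """Environment states named in the text (integer codes are an assumption)."""
    LESSON = 0
    EXERCISE = 1
    ASSIGNMENT = 2
    ASSESSMENT = 3
    EXAM = 4


class Action(Enum):
    """Learner actions named in the text (integer codes are an assumption)."""
    STUDY = 0
    STUDY_EXTRA_MATERIALS = 1
    DO_ASSIGNMENT = 2
    PERFORM_TEST = 3
    SUBMIT_ASSIGNMENT = 4


def empty_q_table() -> np.ndarray:
    """Return a zero-initialised table with one row per state and one column per action."""
    return np.zeros((len(State), len(Action)))


q_table = empty_q_table()
print(q_table.shape)  # (5, 5)
```

Even this reduced listing gives 25 state-action pairs, which illustrates why the full state-action 
space grows quickly once further course elements are added. 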


With the optimal policy calculated from the learner characteristics and the given 
instruction design, and with the logs of the learner profile, the model then provides 
adaptation for each learner based on that learner's learning characteristics. 


4. Evaluations and experimental results 


We use simulation of the Q-learning algorithm because it makes it easy to vary the 
agent parameters, especially the learning characteristics. Figure 9 shows the initial graph. 


Figure 9. The environment network 


We then defined the reward system, with the maximum reward set at 100: if learning 
takes place smoothly throughout the iterations, the maximum reward is 100, as shown in the matrix 
in Figure 10. This matrix also serves as the initial body of the initialized probabilities. 


The initial matrix: 


[[ -1.   0.  -1.  -1.  -1.  -1.  -1.  -1.  -1.  -1.  -1.] 
 [  0.  -1.   0.   0.   0.   0.  -1.  -1.  -1.  -1.  -1.] 
 [ -1.   0.  -1.  -1.  -1.  -1.  -1.  -1.  -1.   0.  -1.] 
 [ -1.   0.  -1.  -1.   0.  -1.   0.  -1.  -1.  -1.  -1.] 
 [ -1.   0.  -1.   0.  -1.   0.   0.   0.  -1.  -1.  -1.] 
 [ -1.   0.  -1.  -1.   0.  -1.  -1.   0.   0.  -1.  -1.] 
 [ -1.  -1.  -1.   0.   0.  -1.  -1.  -1.  -1.  -1. 100.] 
 [ -1.  -1.  -1.  -1.   0.   0.  -1.  -1.  -1.  -1. 100.] 
 [ -1.  -1.  -1.  -1.  -1.   0.  -1.  -1.  -1.  -1. 100.] 
 [ -1.  -1.   0.  -1.  -1.  -1.  -1.  -1.  -1.  -1.  -1.] 
 [ -1.  -1.  -1.  -1.  -1.  -1.   0.   0.   0.  -1. 100.]] 


Figure 10. 
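

A matrix of this shape can be built from a list of links between states. The sketch below is one 
possible construction, not the code used in this research: the edge list is read off the zero 
entries of the initial matrix, state 10 is taken to be the goal, and the names EDGES, GOAL and 
build_reward_matrix are hypothetical. 

```python
import numpy as np

# Undirected links between states, read off the zero entries of the initial matrix.
EDGES = [
    (0, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 9),
    (3, 4), (3, 6), (4, 5), (4, 6), (4, 7),
    (5, 7), (5, 8), (6, 10), (7, 10), (8, 10),
]
GOAL = 10  # assumed goal state: every transition into it carries the reward of 100


def build_reward_matrix(edges, n_states, goal, goal_reward=100.0):
    """Return R where -1 = no transition, 0 = allowed transition, goal_reward = enters the goal."""
    R = np.full((n_states, n_states), -1.0)
    for a, b in edges:
        R[a, b] = goal_reward if b == goal else 0.0
        R[b, a] = goal_reward if a == goal else 0.0
    R[goal, goal] = goal_reward  # staying in the goal state is also rewarded
    return R


R = build_reward_matrix(EDGES, n_states=11, goal=GOAL)
print(R)
```

Keeping the links in a list makes it straightforward to add or remove course elements without 
editing the matrix by hand. 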


With the reinforcement value set at 0.75 and 1000 simulated iterations, starting from the 
matrix shown, we are able to generate the matrix of the best learning path and determine the 
necessary reinforcement, as shown in Figure 10. 


Most efficient path: 


[0, 1, 4, 7, 10] 
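

The procedure can be sketched as follows. This is a minimal illustration under stated assumptions, 
not the implementation used in this research: the reinforcement value of 0.75 is interpreted as the 
discount factor gamma, the learning rate is fixed at 1 (reasonable only because the transitions are 
deterministic), each iteration updates one randomly chosen state-action pair, and the function 
names train_q and greedy_path are hypothetical. 

```python
import numpy as np

# Reward matrix copied from the initial matrix:
# -1 = no transition, 0 = allowed transition, 100 = transition into goal state 10.
R = np.array([
    [-1,  0, -1, -1, -1, -1, -1, -1, -1, -1,  -1],
    [ 0, -1,  0,  0,  0,  0, -1, -1, -1, -1,  -1],
    [-1,  0, -1, -1, -1, -1, -1, -1, -1,  0,  -1],
    [-1,  0, -1, -1,  0, -1,  0, -1, -1, -1,  -1],
    [-1,  0, -1,  0, -1,  0,  0,  0, -1, -1,  -1],
    [-1,  0, -1, -1,  0, -1, -1,  0,  0, -1,  -1],
    [-1, -1, -1,  0,  0, -1, -1, -1, -1, -1, 100],
    [-1, -1, -1, -1,  0,  0, -1, -1, -1, -1, 100],
    [-1, -1, -1, -1, -1,  0, -1, -1, -1, -1, 100],
    [-1, -1,  0, -1, -1, -1, -1, -1, -1, -1,  -1],
    [-1, -1, -1, -1, -1, -1,  0,  0,  0, -1, 100],
], dtype=float)


def train_q(R, gamma=0.75, n_iter=1000, seed=0):
    """Q-learning with learning rate 1: Q[s, a] = R[s, a] + gamma * max(Q[a, :])."""
    rng = np.random.default_rng(seed)
    n_states = R.shape[0]
    Q = np.zeros_like(R)
    for _ in range(n_iter):
        s = rng.integers(n_states)          # random current state
        valid = np.flatnonzero(R[s] >= 0)   # transitions allowed from s
        a = rng.choice(valid)               # exploratory action = index of the next state
        Q[s, a] = R[s, a] + gamma * Q[a].max()
    return Q


def greedy_path(Q, R, start, goal):
    """Follow the highest-valued allowed transition from start until goal is reached."""
    path = [start]
    s = start
    for _ in range(R.shape[0]):             # guard against cycles in an untrained table
        if s == goal:
            break
        masked = np.where(R[s] >= 0, Q[s], -np.inf)
        s = int(np.argmax(masked))
        path.append(s)
    return path


Q = train_q(R)
print("Most efficient path:", greedy_path(Q, R, start=0, goal=10))
```

In this graph several routes from state 0 to state 10 are equally short, for example 
[0, 1, 4, 7, 10] and [0, 1, 5, 8, 10], so the route returned depends on how ties between equal 
Q-values are broken. 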





[Plot: reward gained (vertical axis) against number of iterations (horizontal axis)] 


Figure 11. 


[Diagram: path with learning characteristics defined] 


Figure 12. 


The remaining figures show the results of different simulations and the resulting matrices 
for various possible paths. 


Most efficient path: 


[0, 1, 5, 8, 10] 
Excellent Followed 

[[ 0.  0.  0.  0.  0.  0.  0. ... 
 [ 0.  0.  0.  0.  0. 25.  0. ... 
 [ 0.  0.  0.  0.  0.  0.  0. ... 
 [ 0.  0.  0.  0.  0.  0.  0. ... 
 [ 0.  0.  0.  0.  0. 18.  0. ... 
 [ 0.  0.  0.  0.  0.  0.  0. ... 
 [ 0.  0.  0.  0.  0.  0.  0. ... 
 [ 0.  0.  0.  0.  0. 32.  0. ... 
 [ 0.  0.  0.  0.  0. 47.  0. ... 
 [ 0.  0.  0.  0.  0.  0.  0. ... 
 [ 0.  0.  0.  0.  0.  0.  0. ... 

Figure 13. 

Fail Followed 

[[ 0.  0.  0.  0.  0.  0. ... 
 [ 0.  0. 19.  0.  0.  0. ... 
 [ 0.  0.  0.  0.  0.  0. ... 
 [ 0.  0.  0.  0.  0.  0. ... 
 [ 0.  0.  0.  0.  0.  0. ... 
 [ 0.  0.  0.  0.  0.  0. ... 
 [ 0.  0.  0.  0.  0.  0. ... 
 [ 0.  0.  0.  0.  0.  0. ... 
 [ 0.  0.  0.  0.  0.  0. ... 
 [ 0.  0. 99.  0.  0.  0. ... 
 [ 0.  0.  0.  0.  0.  0. ... 

Figure 1 


Good Followed 

[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.] 
 [ 0.  0.  0.  0. 19.  0.  0.  0.  0.  0.  0.] 
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.] 
 [ 0.  0.  0.  0. 22.  0.  0.  0.  0.  0.  0.] 
 [ 0.  0.  0.  0.  0.  0.  0. 15.  0.  0.  0.] 
 [ 0.  0.  0.  0. 27.  0.  0. 26.  0.  0.  0.] 
 [ 0.  0.  0.  0. 36.  0.  0.  0.  0.  0.  0.] 
 [ 0.  0.  0.  0. 32.  0.  0.  0.  0.  0.  0.] 
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.] 
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.] 
 [ 0.  0.  0.  0.  0.  0.  0. 23.  0.  0.  0.]] 

Figure 14. 


Pass Followed 

[[ 0. ...  0.  0.  0.  0.  0.  0.  0.  0.] 
 [ 0. ... 14.  0.  0.  0.  0.  0.  0.  0.] 
 [ 0. ...  0.  0.  0.  0.  0.  0.  0.  0.] 
 [ 0. ...  0.  0.  0. 30.  0.  0.  0.  0.] 
 [ 0. ... 12.  0.  0. 19.  0.  0.  0.  0.] 
 [ 0. ...  0.  0.  0.  0.  0.  0.  0.  0.] 
 [ 0. ... 27.  0.  0.  0.  0.  0.  0.  0.] 
 [ 0. ...  0.  0.  0.  0.  0.  0.  0.  0.] 
 [ 0. ...  0.  0.  0.  0.  0.  0.  0.  0.] 
 [ 0. ...  0.  0.  0.  0.  0.  0.  0.  0.] 
 [ 0. ...  0.  0.  0. 18.  0.  0.  0.  0.]] 

Figure 15. 


6. Conclusion and future work 


This work presents an enhanced approach to infusing adaptability into learning 
management systems by considering three learner characteristics: learning style, prior 
knowledge and affective state. We studied various processes that can be used to extract these 
characteristics and created an adaptability model based on them using reinforcement learning. 


We implemented reinforcement learning using the Q-learning algorithm to bring out 
adaptability. We have not exhausted all learner characteristics, and we therefore propose that 
future work extend the research to cover further learner characteristics. 